Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

Authors

Dan Hendrycks

Mantas Mazeika

Duncan Wilson

Kevin Gimpel

Abstract

The growing importance of massive datasets with the advent of deep learning makes robustness to label noise a critical property for classifiers. Sources of label noise include automatic labeling for large datasets, non-expert labeling, and label corruption by data poisoning adversaries. In the last case, corruptions may be arbitrarily bad, even so bad that a classifier predicts the wrong labels with high confidence. To protect against such sources of noise, we leverage the fact that a small set of clean labels is often easy to procure. We demonstrate that robustness to label noise up to severe strengths can be achieved by using a set of trusted data with clean labels, and propose a loss correction that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers. Across vision and natural language processing tasks, we experiment with several types of label noise at a range of strengths, and show that our method significantly outperforms existing methods.
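The abstract does not spell out the form of the loss correction, so the following is only a minimal sketch of one way trusted examples could be used for it. It assumes a corruption-matrix approach: a model trained on the noisy labels is evaluated on the trusted set to estimate how each clean class is corrupted, and the clean-label predictions are then pushed through that matrix before computing the cross-entropy against the noisy labels. The function names `estimate_corruption_matrix` and `corrected_cross_entropy` are hypothetical and chosen for illustration only.

```python
import numpy as np


def estimate_corruption_matrix(noisy_probs, trusted_labels, num_classes):
    """Estimate C[i, j] ~ p(noisy label = j | clean label = i).

    noisy_probs: (n, num_classes) softmax outputs, on the trusted examples,
        of a model that was trained on the noisy labels (assumption).
    trusted_labels: (n,) clean integer labels of the trusted examples.
    """
    C = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        mask = trusted_labels == i
        if mask.any():
            # Average predicted noisy-label distribution for clean class i.
            C[i] = noisy_probs[mask].mean(axis=0)
        else:
            # No trusted example of class i: assume the class is uncorrupted.
            C[i, i] = 1.0
    return C


def corrected_cross_entropy(clean_probs, noisy_labels, C, eps=1e-12):
    """Cross-entropy against noisy labels after mapping predictions through C.

    clean_probs: (n, num_classes) model estimates of p(clean label | x).
    noisy_labels: (n,) observed, possibly corrupted, integer labels.
    """
    # p(noisy = j | x) = sum_i p(clean = i | x) * C[i, j]
    noisy_probs = clean_probs @ C
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.log(picked + eps).mean())
```

Under these assumptions, an identity matrix `C` reduces the corrected loss to the ordinary cross-entropy, so the correction only changes training where the trusted data indicates that labels are actually being flipped.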

Similar resources

MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels

Recent studies have discovered that deep networks are capable of memorizing the entire data even when the labels are completely random. Since deep models are trained on big data where labels are often noisy, the ability to overfit noise can lead to poor performance. To overcome the overfitting on corrupted training data, we propose a novel technique to regularize deep networks in the data dimen...

How Do Neural Networks Overcome Label Noise?

This work provides an analytical expression for the effect of label noise on the performance of deep neural networks. [Figure 1: Different types of random label noise. (a) 5 of MNIST's 10 classes, with clean labels; (b) 20% random noise, 100% network prediction accuracy; (c) 20% randomly spread flip noise, 100% accuracy; (d) 20% locally concentrated noise, 80% accuracy.] DNNs are extremely resistant ...

Credit Risk Measurement of Trusted Customers Using Logistic Regression and Neural Networks

The issue of credit risk and deferred bank claims is one of the sensitive issues of the banking industry and can be considered the main cause of bank failures. In recent years, the economic slowdown accompanied by inflation in Iran has led to an increase in deferred bank claims that could put the country's banking system in serious trouble. Accordingly, the current paper presents a prediction...

Adaptive Filtering Strategy to Remove Noise from ECG Signals Using Wavelet Transform and Deep Learning

Introduction: An electrocardiogram (ECG) measures the electrical activity of the heart and is recorded by placing electrodes on the surface of the body. Physicians use observation tools to detect and diagnose heart diseases; cardiologists do the same with ECG signals. In particular, heart diseases are recognized by examining the graphic representation of heart signals w...

A New Method for Assigning Membership to Data and Identifying Noise and Outliers Using Fuzzy Support Vector Machine

Support Vector Machine (SVM), one of the important classification techniques, has recently attracted the attention of many researchers. However, this approach has some limitations. Determining the hyperplane that separates classes with the maximum margin and calculating the position of each point (training data) in a linear SVM classifier can be interpreted as computing a data membersh...

Journal: CoRR

Volume: abs/1802.05300

Publication date: 2018
